ABSTRACT


Fault trees and Petri nets are two widely accepted graphical
tools used in the safety analysis of software. Because some software
is life- and property-critical, thorough analysis techniques are
essential. Used independently, Petri nets and fault trees serve only
limited evaluation purposes. This thesis presents a technique that
converts and links Petri nets to fault trees and fault trees to Petri
nets, combining the benefits of both analysis tools.

Software Fault Tree Analysis and timed Petri nets facilitate
software safety analysis in heterogeneous-multiprocessor control
systems. Analysts use a Petri net to graphically organize the selected
software. A fault tree supports a hazardous condition, the root node,
with subsequent leaf-node paths that lead to the hazard. Through the
combination of Petri nets and fault trees, an analyst can identify a
software fault if an undesired Petri net state, comparable to the fault
tree root fault, can be reached from an initial marking. For this to
happen, all transitions leading from the initial marking to the
undesired state must be enabled, and the states that represent the
leaf nodes of the fault tree path must be marked.

It  is  not  the  intention  of  this  thesis  to  suggest  that  an  analyst  be 
replaced  by  an  automated  tool.  There  must  be  analyst  interaction 
focusing  the  analyst's  insight  and  experience  on  the  hazards  of  a 
system.  This  method  is  proposed  only  as  a  tool  for  evaluation  during 
the  overall  safety  analysis. 

I. INTRODUCTION


A.  SAFETY-CRITICAL  HETEROGENEOUS  SYSTEMS 

Many  military  systems  require  specialized  multiprocessors. 
Weapon  systems  and  aircraft  control  systems  are  prime  examples. 
They  have  complex  controlled  system  architectures  that  must 
operate  under  tight  timing  constraints  requiring  the  use  of  multiple 
processors  to  execute  independent  control  tasks.  System  developers 
may  include  multiple  processors  in  the  initial  prototype  design  or 
add  them  during  subsequent  upgrades.  The  system  may  consist  of 
many  identical  processors,  but  specific  control  needs  require 
heterogeneous  processors. 

"Normally  only  software  that  exercises  direct  command  and 
control  over  the  condition  or  state  of  the  hardware  components  or 
can  monitor  the  state  of  the  hardware  components  are  considered 
critical  from  a  safety  viewpoint."  [Ref.  2]  Software  that  controls  and 
monitors  systems,  such  as  missiles  or  aircraft,  is  safety-critical. 

The use of software in safety-critical flight systems in
multiprocessing environments such as the A-6, the F-18, and the proposed
P-7 military aircraft creates a need to analyze the safety of software.
These intricate multiple-mission systems are susceptible to four
different types of software faults. The first type is an undesired or
unexpected event. The second type is an event occurring out of
sequence. The third type is a specified event failing to occur. The
fourth type is an event whose magnitude or direction is wrong.
[Ref. 2] The use of incompletely developed and analyzed software in
safety-critical systems may cause loss of life, loss of property, or
environmental harm. Formal analysis methodologies applied at all
stages of development, from requirements analysis through
maintenance, help reduce this risk.

For any given system task, the flow of execution of the software
controlling that task may span several processors [Ref. 1]. Flight
systems execute many tasks simultaneously just to keep the craft
airborne and proceeding in a pilot-controlled or automatically
controlled direction.

B.  RISKS  OF  ERRONEOUS  SOFTWARE 

Risk is the probability that an accident of a specified magnitude
will occur over a given time period. When a task, like flight
control, involves a risk to human life or property, the analyst
evaluates the safety of the software in order to avoid an accident
or mishap.

The term "mishap" denotes an unplanned event or series of
events that results in death, injury, occupational illness, damage to or
loss of equipment or property, or environmental damage. It includes both
accidents  and  harmful  exposures.  "A  mishap  can  be  thought  of  as  a 
set  of  events  combining  in  random  fashion  or,  alternatively,  as  a 
dynamic  mechanism  that  begins  with  the  activation  of  a  hazard  and 
flows  through  the  system  as  a  series  of  sequential  and  concurrent 
events  in  a  logical  sequence  until  the  system  is  out  of  control  and  a 
loss  is  produced  (the  'domino  theory')."  [Ref.  4] 




Safety is a concern when systems control or release energy,
whether mechanical, electrical, or chemical. When software is
used in such systems, safety must be ensured so that the risk to human
life is minimized.

Ensuring the safety of software means preventing mishaps. Software
faults may lead to hazards, and hazards may lead to mishaps;
therefore, evaluation of safety-critical software events is vital.

Software  alone  is  not  hazardous,  but  when  the  software  controls 
a  system,  it  becomes  as  potentially  hazardous  as  the  total  system.  A 
failure,  malfunction,  or  design  error  in  the  control  of  hardware 
components  causes  or  allows  a  hazard  to  occur. 

A fault is a software bug. A fault may lead to an error: a
discrepancy between a computed, observed, or measured value or
condition and the true, specified, or theoretically correct value or
condition. An error, in turn, may lead to a failure: the termination of
the ability of a functional unit to perform its required function.
[Ref. 5]

Software faults may result from incorrect or incomplete
specifications and requirements (which lead to incorrect or incomplete
designs), from incorrect programming or coding, or from hardware-induced
corruption. Ideally, a thorough testing program would locate all
faults. In reality, however, the many possible combinations of event
sequences make total fault detection extremely difficult. Also,
analysts write tests only against requirements, so they may overlook
incorrect requirements.
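The size of the problem can be made concrete with a short calculation. The sketch below assumes, as an idealization not drawn from this thesis, that each concurrent task executes a fixed sequence of events and that only the interleaving of events across tasks varies; the function name `interleavings` is hypothetical.

```python
from math import factorial
from typing import List

def interleavings(lengths: List[int]) -> int:
    """Count distinct interleavings of independent event sequences.

    Each task runs a fixed sequence of events; only the relative order of
    events from different tasks varies, which gives the multinomial
    coefficient (n_1 + ... + n_k)! / (n_1! * ... * n_k!).
    """
    total = factorial(sum(lengths))
    for n in lengths:
        total //= factorial(n)  # exact: every partial quotient is an integer
    return total

print(interleavings([10, 10]))    # 184756
print(interleavings([5, 5, 5]))   # 756756
```

Even two tasks of only ten events each admit 184,756 orderings, which is why exhaustive testing of every sequence is impractical.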

In the past, analysts did not find many latent software faults
until the prototype was out in the field. Safety-critical systems
cannot afford this delay in fault discovery because once the system is
out in the field, people, property, or the environment may be at risk.
The analyst must perform a thorough analysis in the stages before
system delivery to help prevent risk.

A  thorough  safety  analysis  is  possible  because  it  does  not  need 
to  consider  all  faults,  just  safety-critical  faults.  Safety  analysis  only 
evaluates  the  system  for  possible  faults  derived  from  the 
Preliminary  Hazard  Analysis  (PHA). 

C.  EVALUATION  OF  SOFTWARE  SAFETY 

The  evaluation  of  the  software  safety  must  trace  the  flow  of 
software  execution,  analyzing  the  sequential  and  concurrent 
operations  performed  and  determining  if  the  system  acts  to  prevent 
or reduce risks. Currently, analysts perform manual analysis using
limited evaluation tools, at substantial cost and with ample opportunity
for analysis errors. Inaccurate results occur readily in systems where
analysts base analysis and design on informal discussions between a
software expert group and a system applications expert group.
Many analysts depend too much on "corporate knowledge" and not
enough on the use of proven methods of design, analysis, and
testing.  Life-,  property-,  and  environment-critical  systems  urgently 
require  thorough  safety  analysis  in  the  software  life  cycle  to  avoid 
risk  to  life  and  property. 

Leveson [Ref. 7] surveys software safety in terms of why, what,
and  how.  "A  fair  conclusion  might  be  that  'why'  is  well  understood, 
'what'  is  still  subject  to  debate,  and  'how'  is  completely  up  in  the  air." 
[Ref.  7] 

Analysts may combine multiple analysis techniques to evaluate
safety. This thesis presents one integrated answer to "how": the
combination of SFTA and Petri net analysis techniques.

D.  SOFTWARE  FAULT  TREE  ANALYSIS 

Leveson  and  Harvey  [Ref.  6]  developed  Software  Fault  Tree 
Analysis (SFTA). Hardware system analysts use Fault Tree Analysis
(FTA) to analyze a system in the context of its environment and
operation, finding credible sequences of events that can lead to a
specified hazard. Leveson and Harvey derived SFTA from FTA to
analyze  systems  containing  software  components.  The  fault  tree  is  a 
graphic  representation  of  parallel  and  sequential  combinations  of 
events  and  system  states  that  result  in  the  occurrence  of  the 
predefined  hazard.  The  events  and  states  can  be  associated  with 
component  failures,  human  errors,  or  any  other  pertinent  events  and 
states  that  can  lead  to  the  hazard.  A  fault  tree  represents  the  logical 
interrelationships  of  events  and  states  that  lead  to  the  hazard. 
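The logical interrelationships a fault tree records can be sketched as a small recursive structure. In this illustration (not Leveson and Harvey's notation), leaves are basic events, each gate is AND or OR, and the tree is evaluated against an assumed set of true leaf events; all event names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

@dataclass
class Basic:
    """Leaf event: a basic fault, a condition, or a normal event."""
    name: str

@dataclass
class Gate:
    """Intermediate event produced by an AND or OR gate over its inputs."""
    name: str
    kind: str                                   # "and" or "or"
    inputs: List["Node"] = field(default_factory=list)

Node = Union[Basic, Gate]

def occurs(node: Node, facts: Dict[str, bool]) -> bool:
    """Return True if the event at `node` occurs, given truth values for leaves."""
    if isinstance(node, Basic):
        return facts.get(node.name, False)     # unlisted leaves do not occur
    results = [occurs(child, facts) for child in node.inputs]
    if node.kind == "and":
        return all(results)
    if node.kind == "or":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")

# Hypothetical tree: the hazard occurs if e1 occurs, or if e2 and e3 both occur.
hazard = Gate("hazard", "or", [Basic("e1"),
                               Gate("g1", "and", [Basic("e2"), Basic("e3")])])
print(occurs(hazard, {"e2": True, "e3": True}))  # True
print(occurs(hazard, {"e2": True}))              # False
```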

E.  TIMED  PETRI  NETS 

Murata [Ref. 8] and Leveson/Stolzy [Ref. 9] state that timed Petri
nets can describe time-critical events in multiprocessor control
applications and can determine whether safety-critical states are
reachable during normal execution.

The  analyst  models  a  system  in  terms  of  conditions  and  events 
with  Petri  nets.  "If  certain  conditions  hold,  then  an  event  or  'state 
transition'  will  take  place  resulting  in  other  (or  the  same)  conditions 
taking  place."  [Ref.  2] 

In the past, analysts mainly used Petri nets to evaluate
performance and correctness. Researchers are currently proposing
that analysts can perform timing analysis more readily with Petri nets
than with fault trees.

F.  SFTA  AND  PETRI  NET  INTEGRATION/TRANSITION 

SFTA  and  timed  Petri  nets  are  integrated  in  this  thesis  to 
facilitate  software  safety  analysis  in  heterogeneous-multiprocessor 
control systems. These techniques are equivalent in expressive
power; analysts can strengthen both techniques by allowing each
technique to use the information about the system expressed by the
other. [Ref. 1]

Analysts  use  a  Petri  net  to  graphically  organize  the  selected 
software code. A fault tree explicitly supports a hazardous condition,
the root node, with subsequent leaf-node paths that lead to the hazard.
Through  the  combination  of  Petri  nets  and  fault  trees,  an  analyst  can 
determine  a  software  fault  if  each  preconditional  node  in  an  entire 
fault  tree  path  links  to  one  or  more  Petri  net  transitions  and  states. 
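The linkage condition in the preceding sentence can be checked mechanically once each fault tree node is mapped to Petri net elements. The sketch below is a hypothetical illustration: the mapping `links`, the element names, and the function `path_is_supported` are assumptions, loosely modeled on the traffic-light example later in this chapter.

```python
from typing import Dict, List, Set

def path_is_supported(path: List[str],
                      links: Dict[str, Set[str]],
                      net_elements: Set[str]) -> bool:
    """True if every fault-tree node on `path` links to at least one
    place or transition that actually exists in the Petri net."""
    return all(links.get(node, set()) & net_elements for node in path)

# Hypothetical names, loosely based on the traffic-light example.
net = {"p4", "p5", "t_premature_green"}
links = {
    "E/W light red": {"p5"},
    "E/W light prematurely green": {"t_premature_green", "p4"},
}
path = ["E/W light red", "E/W light prematurely green"]
print(path_is_supported(path, links, net))                               # True
print(path_is_supported(path + ["N/S car in intersection"], links, net))  # False
```

A path with even one unlinked node is not yet supported by the net, so the analyst would either refine the net or revisit that node.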

Drawing on the specific A-6E example developed in McGraw’s
thesis [Ref. 10], Shimeall, McGraw, and Gill [Ref. 1] describe a general
technique for integrating these two analysis techniques. It uses a
semantic  model  for  information  sharing  between  the  techniques 
during  the  analysis.  This  model  consists  of  three  classes  of  objects: 
states,  transitions,  and  linkages.  The  states  contain  information  on 
conditions  existing  during  program  execution.  Transitions  contain 
information  on  actions  performed  during  program  execution,  and 
reference  the  states  that  lead  to  and  result  from  the  transitions. 
Transitions  also  include  timing  information,  indicating  the  enabling 
and  firing  times  of  the  actions,  along  with  deadlines  after  which  the 
action  stops.  The  state  references  in  the  transitions  allow  for  any 
combination  of  states.  The  linkages  contain  information  on  undesired 
events.  In  general,  the  states  and  transitions  contain  information  for 
the  generation  of  Petri  nets,  and  the  linkages  contain  additional 
information  for  the  generation  of  fault  trees. 
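The three classes of objects can be sketched as plain records. The field names, the units, and the `undeclared_states` consistency check below are hypothetical illustrations, not the data layout used in [Ref. 1]; the check shows one benefit of a shared model, namely that the generated Petri net and fault tree both refer to the same declared states.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class State:
    """A condition that exists during program execution."""
    name: str

@dataclass
class Transition:
    """An action performed during program execution."""
    name: str
    inputs: List[str]                  # states that lead to the transition
    outputs: List[str]                 # states that result from it
    enabling_time: float = 0.0         # hypothetical field names and units
    firing_time: float = 0.0
    deadline: Optional[float] = None   # time after which the action stops

@dataclass
class Linkage:
    """Information on an undesired event (a fault-tree node)."""
    undesired_event: str
    states: List[str] = field(default_factory=list)
    transitions: List[str] = field(default_factory=list)

@dataclass
class SemanticModel:
    states: List[State] = field(default_factory=list)
    transitions: List[Transition] = field(default_factory=list)
    linkages: List[Linkage] = field(default_factory=list)

    def undeclared_states(self) -> List[str]:
        """State names referenced by transitions or linkages but never declared."""
        declared = {s.name for s in self.states}
        used = {n for t in self.transitions for n in t.inputs + t.outputs}
        used |= {n for link in self.linkages for n in link.states}
        return sorted(used - declared)

model = SemanticModel(
    states=[State("p4"), State("p5")],
    transitions=[Transition("t_premature_green", ["p5"], ["p4"])],
    linkages=[Linkage("E/W light prematurely green", ["p4"], ["t_premature_green"])],
)
print(model.undeclared_states())  # []
```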

The  semantic  model  forms  the  basis  for  automated  support  of 
the  safety  analysis  process  by  allowing  the  analyst  to  rapidly  and 
easily  shift  between  techniques.  The  analyst  may  use  Petri  nets  to 
describe  the  system  architecture,  shift  to  fault  trees  to  describe  the 
hazards  associated  with  the  system  and  the  events  that  may  lead  to 
the  hazards,  then  shift  back  and  forth  between  Petri  nets  and  fault 
trees  to  analyze  those  events.  Each  analysis  technique  may  easily  use 
the  results  obtained  by  the  other  technique.  [Ref.  1] 

This  thesis  expands  on  the  Shimeall,  McGraw,  and  Gill  work  by 
delineating  a  stepwise  methodology  for  converting  Petri  nets  to  fault 
trees and fault trees to Petri nets. Graphical as well as tabular
linkages describe the Petri net/fault tree relationship and the
multi-step conversions.

G.  SCOPE  OF  THESIS 

The main research and development in this thesis proposes a
methodology for integrating timed Petri net analysis and SFTA to
analyze software system safety. Chapter II provides an overview of
the background information researched and synthesized with
original thought to create the proposed integrated safety analysis
technique. Chapter III delineates a step-by-step Petri net to fault
tree and fault tree to Petri net conversion and linkage process.
Chapter IV summarizes the research executed to develop the proposed
integrated technique, describes the technique itself, and analyzes
its effectiveness as a safety-analysis technique. Chapter
IV  also  presents  recommendations  for  further  study  in  the  field  of 
safety-analysis-technique  integration. 

II. CURRENT PETRI NET AND FAULT TREE ANALYSIS TECHNIQUES


This  chapter  surveys  Petri  net  and  fault  tree  research.  Analysts 
may  use  any  of  several  analysis  techniques  for  a  safety  evaluation  of 
a  software  system.  However,  the  author  selected  Petri  nets  and  fault 
trees for the proposed integrated method of safety analysis because,
first, they are the most mature and, second, analysts have used them
in analysis for a relatively long time. Researchers have focused a
great  deal  on  these  two  graphical  representations.  Also,  the 
individual  qualities  of  Petri  nets  and  fault  trees  interleave  well  into  a 
single,  effective  analysis  technique. 

This  survey  describes  some  possible  application  areas  of  Petri 
nets  and  fault  trees,  giving  both  the  strong  and  weak  points  of  each 
analysis  technique  and  describing  a  detailed  graphical  and  textual 
representation  of  general  Petri  nets  and  fault  trees. 

A.  PETRI  NET  APPLICATIONS  AND  DEVELOPMENT 

Carl  Petri  [Ref.  11]  created  Petri  nets  in  the  1960s.  Researchers 
have  made  many  enhancements  since  then.  Murata  [Ref.  8]  surveys 
current  Petri  net  properties  and  techniques  thoroughly.  "Petri  nets 
are  a  promising  analysis  tool  for  describing  and  studying  information 
processing  systems  characterized  as  being  concurrent,  asynchronous, 
distributed, parallel, non-deterministic, and/or stochastic." [Ref. 8]

Petri nets are a graphical and mathematical tool that applies to
many software systems. Some possible areas of application are
modeling  and  analysis  of  distributed-software  systems,  concurrent 
and  parallel  programs,  multiprocessor  memory  systems, 
asynchronous  circuits  and  structures,  compiler  and  operating 
systems,  and  other  discrete-event  systems.  Additional  interesting 
applications  are  local-area  networks,  neural  networks,  and  decision 
models.  [Ref.  8] 

Analysts  use  Petri  nets  to  graphically  represent  a  system, 
explicitly  reflecting  the  concurrent  or  parallel  activities  of  the 
system. 

A  Petri  net  is  graphically  represented  as  a  directed  graph  with 
two  kinds  of  nodes:  places  and  transitions.  It  is  textually 
represented  as  a  five-tuple.  The  set  of  places,  textually  represented 
as  P  [Ref.  8]  and  drawn  as  circles,  indicates  the  conditions  or  values 
present  during  program  execution.  The  set  of  transitions,  textually 
represented  as  T  [Ref.  8]  and  drawn  as  bars  or  boxes,  indicates  the 
events  that  occur  during  program  execution.  Arcs  join  the  places  and 
the  transitions  as  shown  in  Figure  2-1  [Ref.  10].  The  arcs  leading  to  a 
transition  represent  a  precondition  or  an  input  and  the  arcs  leading 
from  a  transition  represent  a  postcondition  or  an  output.  These  arcs 
are textually represented as a flow relation, F [Ref. 8], with a weight
function, W, that gives the number of tokens that must flow along each
arc for a transition to occur.

A marking, the presence of tokens in a subset of the places in
the net, indicates the current state of the system. The initial marking,
M0, in the five-tuple [Ref. 8] represents the initial state of the system.
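For reference, Murata's formal definition of the five-tuple [Ref. 8] is reproduced below; the symbols match P, T, F, W, and M0 as used in this section.

```latex
PN = (P, T, F, W, M_0) \quad \text{where}
\begin{aligned}
P &= \{p_1, p_2, \dots, p_m\} && \text{finite set of places} \\
T &= \{t_1, t_2, \dots, t_n\} && \text{finite set of transitions},\ P \cap T = \emptyset,\ P \cup T \neq \emptyset \\
F &\subseteq (P \times T) \cup (T \times P) && \text{set of arcs (flow relation)} \\
W &: F \to \{1, 2, 3, \dots\} && \text{weight function} \\
M_0 &: P \to \{0, 1, 2, \dots\} && \text{initial marking}
\end{aligned}
```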

When a transition fires, indicating a change in the system state,
it consumes a token from each of its input places and generates a
token in each of its output places. A transition leading from one set
of places to another is enabled to fire when all of the places leading
into the transition contain tokens [Ref. 10] (see Figure 2-2 [Ref. 8]).
Each firing potentially enables further transitions, as shown in
Figure 2-3. If more than one transition is enabled at once, a
non-deterministic choice is made as to which transition fires [Ref. 1].
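The enabling and firing rules can be sketched in a few lines of Python. The sketch uses Murata's weighted form [Ref. 8], in which a transition consumes W(p, t) tokens from each input place and produces W(t, p) tokens in each output place; with every weight equal to 1 it reduces to the one-token rule described above. The net, the place names, and the use of a random pick to stand in for the non-deterministic choice are assumptions for illustration.

```python
import random
from typing import Dict, Tuple

Marking = Dict[str, int]                             # place -> number of tokens
Transition = Tuple[Dict[str, int], Dict[str, int]]   # (input weights, output weights)

def enabled(t: Transition, m: Marking) -> bool:
    """t is enabled when every input place p holds at least W(p, t) tokens."""
    inputs, _ = t
    return all(m.get(p, 0) >= w for p, w in inputs.items())

def fire(t: Transition, m: Marking) -> Marking:
    """Return the successor marking M'(p) = M(p) - W(p, t) + W(t, p)."""
    if not enabled(t, m):
        raise ValueError("transition is not enabled")
    inputs, outputs = t
    new = dict(m)
    for p, w in inputs.items():
        new[p] -= w                              # consume input tokens
    for p, w in outputs.items():
        new[p] = new.get(p, 0) + w               # produce output tokens
    return new

def step(net: Dict[str, Transition], m: Marking,
         rng: random.Random) -> Tuple[str, Marking]:
    """Fire one enabled transition; the non-deterministic choice is modeled
    here as a random pick among the enabled transitions."""
    choices = sorted(name for name, t in net.items() if enabled(t, m))
    if not choices:
        raise RuntimeError("no transition is enabled (dead marking)")
    name = rng.choice(choices)
    return name, fire(net[name], m)

# Hypothetical net: t1 needs one token in each of p1 and p2 and puts one in p3.
net = {"t1": ({"p1": 1, "p2": 1}, {"p3": 1})}
m0 = {"p1": 1, "p2": 1, "p3": 0}
print(step(net, m0, random.Random(0)))   # ('t1', {'p1': 0, 'p2': 0, 'p3': 1})
```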

The importance of this non-deterministic firing representation is
that, in real life, events do not happen in a purely sequential order. A
discrete event can occur at any time that all of its preconditions have
been met. One example is driving a car from point A to point B.
There are many mechanical and maneuvering events that can take
place between point A and point B. Some events have a particular
order: the driver must turn the ignition key before the engine
starts. Many events can happen concurrently: the driver may apply
pressure to the accelerator and turn the wheel at the same time. The
driver may make many non-deterministic decisions while getting from
point A to point B, and Petri nets can represent these decisions well.
Using timed Petri nets allows the incorporation of timing information
into the analysis. Real-time embedded-system analysis requires
timing analysis: even basically correct software actions that occur
too early or too late can lead to unsafe conditions. [Ref. 9]

Figure 2-1 Basic Petri Net Structure

Figure 2-2 Basic Petri Net Structure With Tokens Before Transition

Figure 2-3 Basic Petri Net Structure With Tokens After Transition

Analysts add a minimum time function and a maximum time
function to the five-tuple Petri net description above to define the
time frame boundaries within which a firing can occur.
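A minimal sketch of the resulting timing check follows. It assumes the time Petri net convention that both bounds are measured from the moment the transition becomes enabled; the class name and the example window are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimedTransition:
    """A transition with a static firing window [min_time, max_time].

    Both bounds are measured from the moment the transition became enabled.
    """
    name: str
    min_time: float
    max_time: float

    def __post_init__(self) -> None:
        if not 0 <= self.min_time <= self.max_time:
            raise ValueError("require 0 <= min_time <= max_time")

    def can_fire(self, enabled_at: float, now: float) -> bool:
        """True if `now` lies inside the firing window."""
        return self.min_time <= now - enabled_at <= self.max_time

    def overdue(self, enabled_at: float, now: float) -> bool:
        """True if the transition is still pending after its maximum time."""
        return now - enabled_at > self.max_time

# Hypothetical window for the E/W green-to-yellow change; the Ada code in
# Figure 2-4 uses "delay 5.0", and the 5.5 s upper bound is invented here.
green_to_yellow = TimedTransition("ew_green_to_yellow", 5.0, 5.5)
print(green_to_yellow.can_fire(enabled_at=0.0, now=4.0))  # False: too early
print(green_to_yellow.can_fire(enabled_at=0.0, now=5.2))  # True
print(green_to_yellow.overdue(enabled_at=0.0, now=6.0))   # True: too late
```

An overdue transition corresponds to an action that occurs too late, and a firing before `min_time` to one that occurs too early; both are the timing hazards this section describes.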

Safety  analysis  determines  if  the  net  can  reach  an  unsafe  state. 
A human can make an unsafe decision outside of the system, such as
pulling out into oncoming traffic, but the hardware and software
system should not allow a hazard to be reachable.

In the safety analysis [Ref. 9], if an unsafe event (represented by
a transition) can be reached from the initial marking M0, corrections
need to be made. A transition is reachable from a marking if some
sequence of firings starting from that marking can eventually enable
it. Analyzing a system can be complex, but the Petri net aids in the
visual understanding of event ordering and results.
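Reachability can be checked mechanically by a breadth-first search over the markings of the net. The sketch below is a straightforward search, not the tooling used in this thesis: it returns the shortest firing sequence from the initial marking to a marking that enables a designated unsafe transition. The `limit` parameter is a hypothetical safeguard, since the reachability graph can grow very large, and the example net is invented.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Arcs = Dict[str, int]                 # place -> arc weight
Net = Dict[str, Tuple[Arcs, Arcs]]    # transition -> (input arcs, output arcs)

def reachable_path(places: List[str], net: Net, m0: Dict[str, int],
                   unsafe: str, limit: int = 100_000) -> Optional[List[str]]:
    """Breadth-first search of the reachability graph.

    Returns the shortest firing sequence from m0 to a marking that enables
    the `unsafe` transition, or None if none is found among the first
    `limit` markings explored.
    """
    idx = {p: i for i, p in enumerate(places)}

    def enabled(t: str, m: Tuple[int, ...]) -> bool:
        return all(m[idx[p]] >= w for p, w in net[t][0].items())

    def fire(t: str, m: Tuple[int, ...]) -> Tuple[int, ...]:
        new = list(m)
        for p, w in net[t][0].items():
            new[idx[p]] -= w
        for p, w in net[t][1].items():
            new[idx[p]] += w
        return tuple(new)

    start = tuple(m0.get(p, 0) for p in places)
    parent = {start: None}            # marking -> (previous marking, transition)
    queue = deque([start])
    while queue and len(parent) <= limit:
        m = queue.popleft()
        if enabled(unsafe, m):        # unsafe event reachable: rebuild the path
            path: List[str] = []
            while parent[m] is not None:
                m, t = parent[m]
                path.append(t)
            return path[::-1]
        for t in sorted(net):
            if t != unsafe and enabled(t, m):
                nxt = fire(t, m)
                if nxt not in parent:
                    parent[nxt] = (m, t)
                    queue.append(nxt)
    return None

# Hypothetical net: t1 moves a token from a to b, t2 from b to c,
# and the unsafe transition "bad" needs a token in c.
places = ["a", "b", "c"]
net = {"t1": ({"a": 1}, {"b": 1}),
       "t2": ({"b": 1}, {"c": 1}),
       "bad": ({"c": 1}, {})}
print(reachable_path(places, net, {"a": 1}, "bad"))  # ['t1', 't2']
print(reachable_path(places, net, {}, "bad"))        # None
```

The returned firing sequence plays the role of the hazardous path that, per the text, calls for corrections.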

Figure  2-4  [Ref.  12]  contains  Ada  code  for  the  control  of  a  traffic 
light.  A  Petri  net  can  represent  all  system  states  and  transitions 
relating  to  the  software  algorithms.  Figure  2-5  depicts  the  timed 
Petri  net  associated  with  the  Ada  code  in  Figure  2-4. 

Table  1  describes  the  transitions  representing  events  that  could 
cause  the  hazard  of  both  the  East/West  and  the  North/South  cars 
entering  the  intersection  at  the  same  time.  These  descriptions  reflect 
the  erroneous  activity  of  the  East/West  light  while  the  North/South 
car  may  enter  the  intersection. 

 1  procedure traffic is
 2     type direction is (east, west, south, north);
 3     type color is (red, yellow, green);
 4     type light_type is array (direction) of color;
 5     lights : light_type := (green, green, red, red);
 6     task type sensor_task is
 7        entry initialize (mydir : in direction);
 8        entry car_comes;
 9     end sensor_task;
10     sensor : array (direction) of sensor_task;
11     task controller is
12        entry notify (dir : in direction);
13     end controller;
14     task body sensor_task is
15        dir : direction;
16     begin
17        accept initialize (mydir : in direction) do
18           dir := mydir;
19        end initialize;
20        loop
21           accept car_comes;
22           if (lights(dir) /= green) then
23              controller.notify (dir);
24           end if;
25        end loop;
26     end sensor_task;
27     task body controller is
28     begin
29        loop
30           accept notify (dir : in direction) do
31              case dir is
32                 when east | west =>
33                    lights := (green, green, red, red); delay 5.0;
34                    lights := (yellow, yellow, red, red); delay 1.0;
35                    lights := (red, red, green, green);
36                 when south | north =>
37                    lights := (red, red, green, green); delay 5.0;
38                    lights := (red, red, yellow, yellow); delay 1.0;
39                    lights := (green, green, red, red);
40              end case;
41           end notify;
42        end loop;
43     end controller;
44  begin
45     for dir in east .. north loop
46        sensor(dir).initialize (dir);
47     end loop;
48  end traffic;


Figure  2-4  Ada  Code  for  Traffic  Light  Controller 

Table 1 TRANSITIONS LEADING TO HAZARDS IN THE TRAFFIC LIGHT CONTROLLER

ORIGINATION            | TRANSITION                            | DESTINATION
-----------------------+---------------------------------------+----------------------------------
P2  Car at E/W         | E/W car runs light                    | P1  E/W car through intersection
P3  E/W light yellow   | Light broken (stuck on yellow)        | P1  E/W car through intersection
P4  E/W light green    | Light broken (stuck on green)         | P1  E/W car through intersection
P5  E/W light red      | Light broken (stuck on red)           | E/W car runs light
P4  E/W light green    | > 5 second delay from green to yellow | P3  E/W light yellow
P3  E/W light yellow   | > 1 second delay from yellow to red   | P5  E/W light red
P4  E/W light green    | > 6 second delay from green to red    | P5  E/W light red
P5  E/W light red      | Light prematurely green               | P4  E/W light green

If an analyst can follow a hazardous-condition path within the
software code during Petri net analysis, the system will have to be
modified. Software safety analysts seek only software-controlled
errors.

Time is critical in this specific example. Software developers
hard-coded the delay times for light changes in lines 32 through 39
of the selected Ada code. If a light is
functioning,  but  is  delayed  on  green  or  yellow  too  long,  or  turns 
green  prematurely,  a  hazard  exists  and  the  software  code  needs  to  be 
corrected. 

Consider the last item in Table 1, the East/West light turning
green prematurely and allowing two cars traveling in perpendicular
directions into the intersection at the same time. The initial marking
for Figure 2-5 could place tokens, first, in p5, indicating that the E/W
light is red; second, in p9, indicating that the North/South light is
green; and third, in pn, indicating that a car is at the North/South
intersection.

Because p4 can be reached prematurely while the North/South
light is going through its timing sequence, modifications must be
made to the software system. One possible scenario that exposes a
hazard is: 1) A North/South car approaches the intersection and its
signaling turns the North/South light green. 2) An East/West car
approaches the intersection, finds the East/West light red, signals for
green, and waits. 3) A second North/South car approaches the
intersection, checks the light, and sees green. This North/South car
enters the intersection, but as it is entering, the East/West light turns
green. The undesired result is that one North/South car and one
East/West car are in the intersection at the same time.

The drawback in using Petri nets for analysis is that they are
difficult and time-consuming to analyze. Generating the entire
reachability graph of a Petri net consumes exponential space and
time [Ref. 10]. Integrating this analysis method with the more
specific fault tree approach may make evaluation simpler
and potentially save time and money while maintaining order,
concurrency, and timing.
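The exponential growth is easy to see in a small, artificial case. Assume n independent components, each a two-place cycle holding one token that can sit in either place (a construction chosen only for illustration). Every combination of component states is reachable, so the reachability graph has 2^n markings, and the search below must visit all of them.

```python
from collections import deque
from typing import Set, Tuple

def count_reachable_markings(n: int) -> int:
    """Count reachable markings of n independent two-place components.

    Component i has places on_i and off_i holding one token between them;
    a marking is stored as a tuple giving the token count of each on_i
    (off_i then holds 1 - on_i).
    """
    start: Tuple[int, ...] = (0,) * n          # every component starts "off"
    seen: Set[Tuple[int, ...]] = {start}
    queue = deque([start])
    while queue:
        m = queue.popleft()
        for i in range(n):
            # Exactly one of switch_on_i / switch_off_i is enabled in m;
            # firing it moves component i's token to its other place.
            nxt = m[:i] + (1 - m[i],) + m[i + 1:]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)

for n in (1, 4, 8, 12):
    print(n, count_reachable_markings(n))   # 1 2 / 4 16 / 8 256 / 12 4096
```

Adding a single independent component doubles the number of markings, which is the behavior a fault tree can help avoid by directing the search toward the markings linked to a specific hazard.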

B.  FAULT  TREE  APPLICATIONS  AND  DEVELOPMENT 

Researchers  created  Fault  Tree  Analysis  (FTA)  in  the  1960s  for 
analyzing  hardware.  Electrical  and/or  mechanical  systems  needed  a 
way  to  analyze  safety.  Researchers  then  created  Software  Fault  Tree 
Analysis  (SFTA)  in  the  early  1980s  to  evaluate  applications  and 
systems  software.  [Ref.  6] 

The  safety  analyst  conducts  a  Preliminary  Hazard  Analysis 
(PHA)  of  the  software  first.  PHA  seeks  to  find  potential  hazards 
involved  in  the  execution  of  a  system.  The  analyst  then  places  these 
hazards into severity categories. He then employs SFTA to point out
single-point  failure  modes  and  guide  further  design  in  the  most 
fruitful  direction  for  hazard  elimination  and  reduction  [Ref.  12]. 
Certification  personnel  may  use  SFTA  to  examine  already-developed 
software. 

An  undesired  event,  or  hazard,  could  occur  due  to  environmental 
conditions,  human  error,  or  component  failure.  The  fault  tree  depicts 
the logical interrelationship of these basic events that lead to the
hazard.  SFTA  works  backward  and  tries  to  prove  that  the  hazard 
cannot  be  reached.  [Ref.  6] 

A  fault  tree  developer  takes  the  root  node  and  abstractly 
establishes  sets  of  possible  conditions,  or  leaf  nodes,  that  lead  to  the 
root  node. 
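The sets of leaf nodes that lead to the root can be enumerated mechanically. The sketch below uses a straightforward top-down Boolean expansion (the idea behind classical cut-set algorithms, not a procedure taken from this thesis) with a hypothetical tuple encoding of the tree. A minimal cut set of size one corresponds to a single-point failure mode of the kind SFTA is meant to expose.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: a leaf is its event name; a gate is ("and" | "or", [inputs]).
Tree = Union[str, Tuple[str, list]]

def cut_sets(tree: Tree) -> Set[FrozenSet[str]]:
    """All cut sets: sets of leaf events whose joint occurrence causes the root."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    kind, children = tree
    child_sets = [cut_sets(c) for c in children]
    if kind == "or":                     # any one input's cut set suffices
        return set().union(*child_sets)
    if kind == "and":                    # need one cut set from every input
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {kind}")

def minimal(sets: Set[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Keep only cut sets that do not strictly contain another cut set."""
    keep = [s for s in sets if not any(other < s for other in sets)]
    return sorted(keep, key=lambda s: (len(s), sorted(s)))

tree = ("or", ["e1", ("and", ["e2", "e3"]), ("and", ["e1", "e4"])])
for cs in minimal(cut_sets(tree)):
    label = "single-point failure" if len(cs) == 1 else "combined failure"
    print(sorted(cs), label)
# ['e1'] single-point failure
# ['e2', 'e3'] combined failure
```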

"Proof  by  contradiction  is  conveniently  used  in  SFTA  since  the 
goal  of  the  analysis  is  to  prove  that  the  software  will  not  permit 
some  event."  [Ref.  12]  If  the  analysis  can  prove  a  contradiction  to  the 
loss  (root)  event  then  the  event  cannot  happen  as  a  single-point 
failure  within  the  software.  If  the  initial  software  or  system  state 
starts  a  possible  series  of  events  that  lead  to  the  root  event,  then  the 
system  developers  need  to  alter  the  software  or  system  design  to 
prevent  the  root  event  [Ref.  1]. 

Leveson  and  Harvey  [Ref.  6]  list  and  describe  the  relevant  fault 
tree  symbols  used  in  SFTA.  These  graphical  representations  are 
shown in Figure 2-6. Fault trees are textually represented by N, the
set of nodes, and G, the set of 'and' and 'or' gates.

One  possible  hazard  description  resulting  from  a  PHA  done  on 
the high-level Ada code in Figure 2-4 [Ref. 12] is that both a car from
the North/South direction and a car from the East/West direction can
enter  the  intersection  at  the  same  time.  Figure  2-7  portrays 
probable  scenario  nodes  that  could  lead  to  this  root  node. 

The rectangle indicates an event to be analyzed further.

The circle indicates a basic fault event or primary failure of a
component. It requires no further development, and its probability of
occurrence is derived from the generic rate of the part.

The house is used for events which normally occur in the system. It
represents the continued operation of the component, and its
probability is the reliability of the part.

The diamond is used for non-primary events which are not developed
further for lack of information or insufficient consequence.

The oval is used to indicate a condition. It defines the state of the
system that permits a fault sequence to occur. It may be normal or
result from failures.

The AND gate serves to indicate that all input events are required in
order to cause the output event.

The OR gate indicates that one or more of the input events are
required to produce the gated events.

Figure 2-6 Fault Tree Symbols

Figure 2-7 Possible Fault Tree for Traffic Light Controller

For fault tree development and evaluation for other hazards, the
analyst  should  expand  on  the  remainder  of  the  root  nodes  derived 
from  the  PHA  in  order  of  priority. 

Two conditions could allow two cars traveling in perpendicular directions to be in the intersection at the same time: a spurious yellow light or a spurious green light displayed in the North/South direction while a green light is displayed in the East/West direction.
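The AND and OR gate semantics of Figure 2-6 can be sketched in a few lines of Python. The `Node` class, the `occurs` function, and the event names below are hypothetical illustrations; the tree simply encodes the two conditions just described (a spurious North/South yellow or green light, combined with an East/West green) and is not a transcription of Figure 2-7.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A fault tree node: a basic event (no gate) or a gated event."""
    name: str
    gate: Optional[str] = None            # "and", "or", or None for a basic event
    children: List["Node"] = field(default_factory=list)


def occurs(node: Node, basic: Dict[str, bool]) -> bool:
    """Return True if the event represented by `node` occurs."""
    if node.gate is None:
        # Basic event: its truth value is supplied by the caller.
        return basic[node.name]
    results = [occurs(child, basic) for child in node.children]
    if node.gate == "and":
        # AND gate: all input events are required to cause the output event.
        return all(results)
    if node.gate == "or":
        # OR gate: one or more input events suffice.
        return any(results)
    raise ValueError(f"unknown gate {node.gate!r}")


# Hypothetical tree for the hazard "two perpendicular cars in the intersection".
hazard = Node("two cars in intersection", "and", [
    Node("spurious N/S yellow or green", "or", [
        Node("spurious N/S yellow"),
        Node("spurious N/S green"),
    ]),
    Node("E/W shows green"),
])

print(occurs(hazard, {"spurious N/S yellow": False,
                      "spurious N/S green": True,
                      "E/W shows green": True}))   # True
```

Evaluating the tree bottom-up mirrors how the analyst reads it: the OR node is true if either spurious light appears, and the root is true only when that coincides with an East/West green.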

Figure  2-8  [Ref.  12]  expands  on  Figure  2-7  by  detailing  and 
tailoring  the  fault  tree  to  examine  the  traffic-light  program  for  the 
presence  of  spurious  light  conditions  that  enable  two  perpendicular 
cars  to  be  in  an  intersection  at  the  same  time.  The  selected  Ada  code 
determines  the  timing  of  the  light  changes  in  lines  32  through  39. 
The system developers hard-coded the delay times between light changes.

Because  the  high-level  code  allows  at  least  one  path  from  the 
initial  system  state  to  the  hazard,  the  Ada  code  supports  the  fault 
tree  and  is  hazardous. 
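The argument above turns on whether at least one path leads from the initial system state to the hazard. A minimal breadth-first search sketch of that check follows. The state graph is a hypothetical, heavily simplified traffic-light model (the edge into the hazardous state stands in for a spurious light change) and is not derived from the Ada code of Figure 2-8; `path_to_hazard` is likewise an illustrative name.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional


def path_to_hazard(graph: Dict[Hashable, List[Hashable]],
                   initial: Hashable,
                   hazard: Hashable) -> Optional[List[Hashable]]:
    """Breadth-first search for a shortest path from `initial` to `hazard`.

    Returns the states along one such path, or None if the hazard is
    unreachable from the initial state.
    """
    parent: Dict[Hashable, Optional[Hashable]] = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if state == hazard:
            # Rebuild the witness path by following parent links back.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for successor in graph.get(state, []):
            if successor not in parent:
                parent[successor] = state
                queue.append(successor)
    return None


# Hypothetical light states; the second edge out of "NS yellow / EW red"
# models a spurious East/West green.
graph = {
    "NS green / EW red": ["NS yellow / EW red"],
    "NS yellow / EW red": ["NS red / EW green", "NS yellow / EW green"],
    "NS red / EW green": ["NS red / EW yellow"],
    "NS red / EW yellow": ["NS green / EW red"],
}
print(path_to_hazard(graph, "NS green / EW red", "NS yellow / EW green"))
```

The returned path is the kind of evidence the analysis relies on: a concrete sequence of states from the initial state to the hazard. A result of `None` means no such sequence exists in the model.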

"The  above  fault  tree  analysis  demonstrates  that  the  above  Ada 
code  could  contribute  to  the  hazard  (i.e.,  two  cars,  traveling  at  right 
angles  to  one  another,  are  present  in  the  intersection  at  the  same 
time)  if  two  successive  rendezvous  occur  with  the  east  or  west  sensor 
tasks  and  the  north  sensor  task  checks  the  state  of  the  lights 
immediately  prior  to  the  second  rendezvous".  [Ref.  11]  Eliminating 
this  hazard  requires  changes  to  the  code  or  controller. 
Figure 2-8 Fault Tree Reflecting Ada Code for Traffic Light Controller
SFTA is limited when it is the only tool used to analyze the software safety of a system. Fault trees are a static analysis technique, whereas timing analysis requires dynamic analysis [Ref. 7]. "They can, however, detect software logic errors and multiple failure sequences that may have essential information that can be shared with Petri net analysis" [Ref. 10].

C.  INTEGRATING  ANALYSIS  TECHNIQUES 

Over the past few years, the software system safety research community has established the practice of integrating multiple system reliability analysis techniques. One such technique, the Hybrid Automated Reliability Predictor (HARP) [Ref. 13], was developed in 1986. It integrates fault tree notation with Markov chains. "HARP converts the dynamic fault tree model notation into a Markov Chain and solves the Markov Chain using a standard well known numerical integration algorithm." [Ref. 14]
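To illustrate the general idea of converting a fault tree into a Markov chain and integrating it numerically (this is not HARP's algorithm, which the sketch does not reproduce), consider a top event that is the AND of two identical, independent components, each failing at an assumed constant rate `lam`. The three-state chain below is solved with a classical fourth-order Runge-Kutta step and compared with the closed form (1 - e^(-lam t))^2. The function name and parameter values are hypothetical.

```python
import math


def and_gate_failure_probability(lam: float, t_end: float, steps: int = 1000) -> float:
    """Probability that both of two identical, independent components have
    failed by time t_end, computed from a 3-state continuous-time Markov chain.

    States: 0 = both working, 1 = one failed, 2 = both failed (top event).
    Rates:  0 -> 1 at 2*lam (either component may fail), 1 -> 2 at lam.
    """
    def derivative(p):
        p0, p1, p2 = p
        return (-2 * lam * p0,
                2 * lam * p0 - lam * p1,
                lam * p1)

    p = (1.0, 0.0, 0.0)                    # start with both components working
    h = t_end / steps
    for _ in range(steps):
        # One classical Runge-Kutta step for the forward equations dP/dt = P Q.
        k1 = derivative(p)
        k2 = derivative(tuple(x + h / 2 * k for x, k in zip(p, k1)))
        k3 = derivative(tuple(x + h / 2 * k for x, k in zip(p, k2)))
        k4 = derivative(tuple(x + h * k for x, k in zip(p, k3)))
        p = tuple(x + h / 6 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(p, k1, k2, k3, k4))
    return p[2]


lam, t = 1e-3, 500.0
numeric = and_gate_failure_probability(lam, t)
exact = (1 - math.exp(-lam * t)) ** 2      # product of the two unreliabilities
print(f"{numeric:.6f} {exact:.6f}")
```

For this static AND gate the closed form makes the chain unnecessary. The chain becomes worthwhile when the gate semantics depend on order or repair, which static fault tree arithmetic cannot express.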

For many applications, analysts may use either software fault trees or timed Petri nets to evaluate safety-critical behaviors of the control software [Ref. 1]. Petri nets explicitly model the structure of a
control  system  and  the  events  during  the  execution  of  the  control 
system.  The  semantics  of  those  events  and  the  resulting  conditions 
are  only  represented  abstractly.  In  fault  trees,  the  semantics  of  the 
conditions  and  events  are  explicitly  described,  but  the  structure  that 
gives  rise  to  those  events  is  dealt  with  abstractly  [Ref.  1]. 

Shimeall, McGraw, and Gill [Ref. 1] argue for an integration of these two analysis techniques that gives the analyst different views of the system. Analysts should not consider Petri nets and fault trees to be alternative techniques, but rather complementary ones.

Leveson and Stolzy [Ref. 9] use Petri nets and FTA in conjunction with one another. Developing a complete reachability graph from the Petri nets representing a system is time-consuming and difficult; however, the evaluation is simpler when the Petri nets describe only the key portions of a system [Ref. 9]. The safety analyst needs to consider only the life-, property-, or environment-critical portions of the high-level code in the analysis.
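For small nets, the reachability graph mentioned above can be enumerated directly. The sketch below ignores the enabling and deadline times of a timed Petri net and treats the net as an ordinary place/transition net with arc weights W and an initial marking M0. The two-process mutual-exclusion net is a hypothetical example, chosen because checking a safety property then amounts to scanning every reachable marking.

```python
from collections import deque
from typing import Dict, List, Tuple

Marking = Tuple[int, ...]
Arcs = Dict[int, int]                      # place index -> arc weight


def reachability_graph(pre: Dict[str, Arcs], post: Dict[str, Arcs],
                       m0: Marking, limit: int = 10_000
                       ) -> Dict[Marking, List[Tuple[str, Marking]]]:
    """Breadth-first enumeration of every marking reachable from m0.

    pre[t] gives the tokens transition t consumes from each input place and
    post[t] the tokens it produces in each output place. Raises an error
    after `limit` markings, because an unbounded net has an infinite graph.
    """
    graph: Dict[Marking, List[Tuple[str, Marking]]] = {m0: []}
    queue = deque([m0])
    while queue:
        m = queue.popleft()
        for t, inputs in pre.items():
            # t is enabled when every input place holds at least the arc weight.
            if all(m[i] >= w for i, w in inputs.items()):
                nxt = list(m)
                for i, w in inputs.items():
                    nxt[i] -= w
                for i, w in post.get(t, {}).items():
                    nxt[i] += w
                successor = tuple(nxt)
                graph[m].append((t, successor))
                if successor not in graph:
                    if len(graph) >= limit:
                        raise RuntimeError("reachability graph exceeds limit")
                    graph[successor] = []
                    queue.append(successor)
    return graph


# Places: 0 idle_a, 1 critical_a, 2 idle_b, 3 critical_b, 4 mutex.
pre = {"enter_a": {0: 1, 4: 1}, "leave_a": {1: 1},
       "enter_b": {2: 1, 4: 1}, "leave_b": {3: 1}}
post = {"enter_a": {1: 1}, "leave_a": {0: 1, 4: 1},
        "enter_b": {3: 1}, "leave_b": {2: 1, 4: 1}}
graph = reachability_graph(pre, post, (1, 0, 1, 0, 1))
# Safety check: no reachable marking has both processes in the critical section.
print(len(graph), any(m[1] and m[3] for m in graph))   # 3 False
```

Even this toy net shows why full reachability graphs grow quickly: every independent token adds a dimension to the state space, which is why restricting the net to key portions pays off.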

The SFTA system state information, represented as the precondition leaf nodes of a specific root fault discovered in the PHA, reduces the sheer size of the Petri net analysis [Ref. 10]. Fault trees address one specific undesired event at a time, breaking the system analysis down into multiple discrete safety issues.
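One simple way to let a single root fault shrink the net, offered here as an illustrative heuristic rather than the procedure of Ref. 10, is to keep only the places and transitions that have a directed flow path to the place representing the hazard. The function name and the small example net below are hypothetical.

```python
from typing import Dict, Set, Tuple


def backward_cone(flow: Set[Tuple[str, str]], target: str) -> Set[str]:
    """Return every place or transition with a directed flow path to `target`.

    `flow` is the flow relation F as (source, destination) pairs, mixing
    place-to-transition and transition-to-place arcs.
    """
    predecessors: Dict[str, Set[str]] = {}
    for src, dst in flow:
        predecessors.setdefault(dst, set()).add(src)
    cone = {target}
    stack = [target]
    while stack:
        node = stack.pop()
        for src in predecessors.get(node, ()):
            if src not in cone:
                cone.add(src)
                stack.append(src)
    return cone


# Hypothetical net: p3 -> t3 -> p4 is unrelated to the hazard place.
flow = {("p1", "t1"), ("t1", "p2"), ("p2", "t2"), ("t2", "hazard"),
        ("p3", "t3"), ("t3", "p4")}
print(sorted(backward_cone(flow, "hazard")))
```

Everything outside the cone can neither add tokens to the hazard place nor to any place that feeds it, so it is a natural first cut before building a reachability graph.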

McGraw [Ref. 10] analyzes a real-time software example: the upgrade of the A-6E Operational Flight Program (OFP 240). China Lake controls and directs the new program, OFP 250. McGraw represents a key portion of this system with a Petri net and a fault tree.

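Shimeall, McGraw, and Gill [Ref. 1] define the semantics of a linkage relation between a Petri net and a fault tree. They construct the semantic model shown in Table 2 by joining the separate formal descriptions presented previously and adding the linkage relation. This linkage represents the logical relationship between fault tree nodes and Petri net places and transitions. In words, the linkage relation has three elements: 1) the union of the Petri net places, P, and transitions, T; 2) G, the 'and', 'or', or 'null' gate that qualifies each link; and 3) the fault tree nodes, N. Together these define a Petri net fault tree linkage.

Table 2 Petri Net, Fault Tree, and Semantic Formal Descriptions

Timed Petri Nets:

$$
\begin{aligned}
\mathit{tpn} &= \langle P, T, F, W, E, D, M_0 \rangle \\
P &= \{p_1, p_2, \ldots, p_m\} && \text{places} \\
T &= \{t_1, t_2, \ldots, t_k\} && \text{transitions} \\
F &\subseteq (P \times T) \cup (T \times P) && \text{flow relation} \\
W &: F \to \{1, 2, 3, \ldots\} && \text{weight (tokens on each flow)} \\
E &= \{e_1, e_2, \ldots, e_k\} && \text{enabling times} \\
D &= \{d_1, d_2, \ldots, d_k\} && \text{deadline times} \\
M_0 &: P \to \{1, 2, 3, \ldots\} && \text{initial marking}
\end{aligned}
$$

Fault Trees:

$$
\begin{aligned}
\mathit{ft} &= \langle N, G, S, C, R \rangle \\
N &= \{n_1, n_2, \ldots, n_j\} && \text{nodes (fault/failure statements)} \\
G &= \{g_1, g_2, \ldots, g_j\} && \text{gates (logical connections)} \\
S &= \{s_1, s_2, \ldots, s_j\} && \text{shapes (analysis role)} \\
C &\subseteq (N \times N) && \text{child relation} \\
R &\in N && \text{root node}
\end{aligned}
$$

Semantic Model:

$$
\begin{aligned}
\mathit{sm} &= \langle L, \mathit{tpn}, \mathit{ft} \rangle \\
L &\subseteq (P \cup T) \times G \times N && \text{linkage relation}
\end{aligned}
$$

Constraints:

$$
\begin{aligned}
& P \cap T = \emptyset, \quad P \cup T \neq \emptyset \\
& \forall i,\ 1 \le i \le k,\ e_i \ge 0 \wedge d_i \ge 0 \wedge d_i \ge e_i \\
& \forall i,\ 1 \le i \le j,\ g_i \in \{\mathit{and}, \mathit{or}, \mathit{null}\} \\
& \forall i,\ 1 \le i \le j,\ s_i \in \{\mathit{box}, \mathit{house}, \mathit{diamond}, \mathit{circle}, \mathit{oval}\} \\
& |C| = j - 1 \\
& \forall i,\ 1 \le i \le j, \\
& \quad \big(n_i \neq R \Rightarrow |\{(n_q, n_i) \in C \mid 1 \le q \le j \wedge q \neq i\}| = 1\big) \wedge {} \\
& \quad \big(n_i = R \Rightarrow |\{(n_q, n_i) \in C \mid 1 \le q \le j \wedge q \neq i\}| = 0\big) \\
& \forall y,\ y \in (P \cup T),\ \forall n,\ n \in N, \\
& \quad \big(\exists g,\ g \in \{\mathit{and}, \mathit{or}, \mathit{null}\},\ (y, g, n) \in L\big) \Rightarrow \big|\{g \in \{\mathit{and}, \mathit{or}, \mathit{null}\} \mid (y, g, n) \in L\}\big| = 1
\end{aligned}
$$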
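The fault tree and linkage constraints of Table 2 are mechanical enough to check in code. The sketch below assumes a fault tree is given as a node set, a set of (parent, child) pairs for C, and a root R, and that L is a set of (element, gate, node) triples. It also checks that every node descends from R, an extra condition added here so that C is guaranteed to be a tree. The example links are the ones named for OFP 240 in Chapter III, with 'null' gates assumed; the child pairs form a hypothetical fragment, and the function names are illustrative.

```python
from typing import Dict, Set, Tuple

GATES = {"and", "or", "null"}


def is_fault_tree(nodes: Set[str], child: Set[Tuple[str, str]], root: str) -> bool:
    """Check the Table 2 constraints on C, plus reachability from the root."""
    if root not in nodes or len(child) != len(nodes) - 1:
        return False                              # |C| = j - 1
    parent_count: Dict[str, int] = {n: 0 for n in nodes}
    children: Dict[str, Set[str]] = {n: set() for n in nodes}
    for parent, kid in child:
        if parent not in nodes or kid not in nodes or parent == kid:
            return False
        parent_count[kid] += 1
        children[parent].add(kid)
    # The root has no parent; every other node has exactly one.
    if any(count != (0 if n == root else 1) for n, count in parent_count.items()):
        return False
    # Extra check: every node descends from the root (rules out detached cycles).
    seen, stack = {root}, [root]
    while stack:
        for kid in children[stack.pop()]:
            if kid not in seen:
                seen.add(kid)
                stack.append(kid)
    return seen == nodes


def is_linkage(links: Set[Tuple[str, str, str]], elements: Set[str], nodes: Set[str]) -> bool:
    """Check L is a subset of (P u T) x {and, or, null} x N, one gate per (y, n)."""
    gate_of: Dict[Tuple[str, str], str] = {}
    for y, gate, n in links:
        if y not in elements or n not in nodes or gate not in GATES:
            return False
        if gate_of.setdefault((y, n), gate) != gate:
            return False                          # same (y, n) linked with two gates
    return True


nodes = {"n1", "n2", "n5", "n9", "n10"}
child = {("n1", "n2"), ("n2", "n5"), ("n5", "n9"), ("n5", "n10")}
links = {("t16", "null", "n1"), ("t16", "null", "n2"), ("t14", "null", "n5"),
         ("t9", "null", "n9"), ("p15", "null", "n10")}
print(is_fault_tree(nodes, child, "n1"),
      is_linkage(links, {"t9", "t14", "t16", "p15"}, nodes))   # True True
```

Note that the same transition, t16, may legitimately link to two nodes; the last constraint only forbids linking one (element, node) pair through two different gates.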

Chapter  III  expands  on  this  work  by  describing  the  conversion 
from  a  Petri  net  to  a  fault  tree,  the  conversion  from  a  fault  tree  to  a 
Petri  net,  and  the  formal  linkage  relationship  between  the  two.  Once 
the  analyst  identifies  the  places  and  transitions  that  may  lead  to  the 
root  event,  he  incorporates  them  into  the  linkage  relation  [Ref.  1]. 

Integrating, converting, and linking Petri net and fault tree analysis techniques to evaluate selected software code may reduce development and maintenance effort by eliminating redundant analysis.
III. ANALYSIS TECHNIQUE INTEGRATION


A.  INTRODUCTION 

This thesis presents an analysis technique that integrates software fault tree analysis (SFTA) and timed Petri nets to facilitate software safety analysis in heterogeneous-multiprocessor control systems. Combining Petri nets and fault trees in software safety analysis is more convenient than using either technique individually.

The purpose of integrating Petri nets and fault trees is to
enhance  software  safety  analysis.  The  integrated  technique  melds 
the  use  of  the  Petri  net  representation  of  system  events  and  the 
explicit  fault  representation  and  diagnosis  in  fault  trees  for  a 
synergistic  effect. 

The  example  analyzed  here  proves  that  the  design  of  a  change  in 
the  flight  control  system  of  the  A-6  fighter/bomber  prevents  an 
important  hazard,  inadvertent  missile  launch  during  practice 
[Ref.  10].  The  Petri  net  in  Figure  3-1  (taken  from  McGraw  [Ref.  10]) 
represents  a  high  level  control  flow  of  the  proposed  upgrade  to  the 
Grumman  A-6E  Operational  Flight  Program  (OFP  240). 

The analyst must decide whether to begin the safety analysis with Petri net or fault tree development. The analyst creates the Petri net to organize and partially order the events that occur during system execution. This visual representation of events, via transitions and states, simplifies the analyst's understanding of individual software algorithms and their interaction, whether or not the system evaluator is experienced in computer programming. When analysts pictorially organize activity paths, it is easier to point out system problems that may lead to hazards. This is especially true when there is concurrency or timing of action. Leveson and Stolzy [Ref. 9] explain Petri net development. A conversion from a Petri net to a fault tree and a linkage between the two are described in Section B. The analyst establishes a fault tree to Petri net graphical and tabular linkage, as in Figure 3-2, while he is creating the fault tree from a Petri net.

In the proposed conversion and integration method, the analyst develops the fault tree first if the system chosen for analysis is independent, synchronous, localized, serial, deterministic, or non-stochastic. The analyst tailors the fault tree to a top event that corresponds to a particular system hazard, detailing the events and conditions that lead to the hazard [Ref. 1].

B. PETRI NET TO FAULT TREE CONVERSION AND INTEGRATION

The dynamics of Petri nets aid the analyst in understanding and evaluating complex systems. However, Petri nets represent the semantics of the events and conditions only abstractly. Should the analysis require explicit representation of the semantics, the analyst may choose to convert the Petri net to a fault tree and pursue the analysis using fault tree techniques. This section details the steps used in Petri net to fault tree conversion.
Figure 3-2 General Petri Net and Fault Tree Graphical and Tabular Linkage
1. Petri Net to Fault Tree Conversion Initiation

Initiate a conversion from a Petri net to a fault tree by choosing a root fault derived from the Preliminary Hazard Analysis (PHA) of the high-level code. The example root fault, n1 (Practice Command Causes Actual Effect), in Figure 3-3, is one of several root faults that could result from a PHA on OFP 240.


Figure 3-3 OFP Root Fault (n1: Practice Command Causes Actual Effect)

2. Petri Net and Fault Tree Starting Point Link

The selected root fault provides a starting point for the link between the Petri net and fault tree. Once the analyst has selected the root fault, he associates it with the set of Petri net places and transitions that may immediately lead to the root fault. For the OFP 240 example, the fault tree root fault condition node, n1 (Practice Command Causes Actual Effect), in Figure 3-2 corresponds to the resultant transition, t16 (Command Executed), in Figure 3-1. The effect being analyzed is the firing of the weapon prompted by the executing command. While working backward in the Petri net, the analyst develops and supports subsequent fault tree nodes by working downward in the fault tree, as delineated in the next step.
3. Petri Net to Fault Tree Graphical and Tabular Linkage

Link the fault tree to the Petri net by working backward in the Petri net, linking each new fault tree node to the Petri net places and transitions it relates to, to see whether the initial marking of the Petri net is reached. Figure 3-4 exemplifies a graphical cross linking of a portion of the existing Petri net and the related, newly created fault tree segments. Note that the fault tree node n2 (AIU Executes Live Command) also relates to the Petri net transition t16 (Command Executed(STA)). This node and transition both indicate the execution of the command to fire the weapon. The later fault tree node n5 (Command Sent From ACU) corresponds directly to the Petri net transition t14 (Pulse Propagated) as the analyst works back up the Petri net. Two fault tree nodes, n9 (Command Enabled) and n10 (Arm Switch Signal On), are preconditions to n5. The fault tree node n9 links to the Petri net transition t9 (Command Ready). The other node required before a command is sent, n10 (Arm Switch Signal On), is analogous to the Petri net place p15. P15 indicates that a firing pulse is ready for propagation due to the prerequisite condition of the landing gear being up.

This linkage can also be represented in tabular notation, as shown in Figure 3-5, reflecting the Petri net to fault tree relationship. Once a Petri net and fault tree are initially created and linked, each fault tree has one or more corresponding linked portions of the Petri net, identified in the Petri net fault tree (PNFT) Linkage.
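The tabular linkage can be held in an ordinary dictionary, and inverting it gives the Petri net side of the same relation. The sketch below records only the five links named in the walk-through above; it does not reproduce the layout or full contents of Figure 3-5, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (fault tree node, Petri net element) pairs named in the OFP 240 walk-through.
LINKS: List[Tuple[str, str]] = [
    ("n1", "t16"),    # Practice Command Causes Actual Effect / Command Executed
    ("n2", "t16"),    # AIU Executes Live Command / Command Executed(STA)
    ("n5", "t14"),    # Command Sent From ACU / Pulse Propagated
    ("n9", "t9"),     # Command Enabled / Command Ready
    ("n10", "p15"),   # Arm Switch Signal On / firing pulse ready
]


def by_node(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Fault tree view: each node with the Petri net elements linked to it."""
    table: Dict[str, List[str]] = defaultdict(list)
    for node, element in pairs:
        table[node].append(element)
    return dict(table)


def by_element(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Petri net view: each place or transition with the nodes linked to it."""
    table: Dict[str, List[str]] = defaultdict(list)
    for node, element in pairs:
        table[element].append(node)
    return dict(table)


for node, elements in by_node(LINKS).items():
    print(node, elements)
print(by_element(LINKS)["t16"])   # ['n1', 'n2']
```

Keeping both views from one list of pairs means the graphical and tabular linkages cannot drift apart as nodes are added.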

4. Complete Development of the Petri Net Fault Tree Linkage Table

Indicate the appropriate type of gate, 'and', 'or', or 'null', in the PNFT Linkage table. Use 'and' or 'or' when two or more Petri net places and transitions relate to a single fault tree node, and 'null' when a single Petri net place or transition relates to a single fault tree node. Figure 3-6 shows the gate relationships for the example segment. Sometimes the table does not reflect a complete linkage.

Figure 3-7 represents the fully developed fault tree.
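The gate rule of this step can be sketched as a function that turns the node-to-element table into (element, gate, node) triples. A node linked to a single Petri net element receives 'null'; for a node linked to several, the choice between 'and' and 'or' remains the analyst's, so the sketch requires it as input. The function name and the node "nA" in the usage lines are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_gates(table: Dict[str, List[str]],
                 analyst_choice: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Turn a node -> Petri net elements table into (element, gate, node) triples."""
    triples: List[Tuple[str, str, str]] = []
    for node, elements in table.items():
        if len(elements) == 1:
            gate = "null"                 # one place or transition for one node
        else:
            gate = analyst_choice.get(node, "")
            if gate not in ("and", "or"):
                raise ValueError(f"{node} links {len(elements)} elements: "
                                 "the analyst must choose 'and' or 'or'")
        triples.extend((element, gate, node) for element in elements)
    return triples


print(assign_gates({"n9": ["t9"], "nA": ["p3", "t4"]}, {"nA": "and"}))
# [('t9', 'null', 'n9'), ('p3', 'and', 'nA'), ('t4', 'and', 'nA')]
```

Raising an error instead of defaulting to one gate keeps the decision with the analyst, in line with the thesis's view that the method supports rather than replaces analyst judgment.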

5. Remedy If No Path to the Unsafe Event Is Exposed

If it is not possible to work back to the beginning of the Petri net, exposing a potential path to the unsafe event, do further analysis and take at least one modification measure: expand the fault tree, add conditions, or make an assumption and see where the assumption leads in the Petri net.

The software design itself may prevent a complete backtrack from the root fault to the initial marking, even if there is a single-component failure. This prevention may be determined by comparing the conditions expressed in the fault tree with those represented by the initial marking of the Petri net.
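Part of this comparison can be automated. Assuming each fault tree leaf is linked to a single Petri net place, a leaf whose place is empty in the initial marking and has no input transition can never become true, so the branch it supports is blocked. This is only a sufficient condition: a place can also stay empty because its input transitions are never enabled, which requires reachability analysis of the full net. The function name and example data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def blocked_leaves(leaf_place: Dict[str, str], m0: Dict[str, int],
                   flow: Set[Tuple[str, str]]) -> List[str]:
    """Leaves whose linked place starts empty and has no incoming arc."""
    fed = {dst for _src, dst in flow}      # elements that have an input arc
    return sorted(leaf for leaf, place in leaf_place.items()
                  if m0.get(place, 0) == 0 and place not in fed)


leaf_place = {"leafA": "pA", "leafB": "pB", "leafC": "pC"}
m0 = {"pA": 1}                             # only pA is marked initially
flow = {("pA", "t1"), ("t1", "pB")}        # pB can be marked by firing t1
print(blocked_leaves(leaf_place, m0, flow))   # ['leafC']
```

A blocked leaf under an 'and' gate blocks the whole gate; under an 'or' gate the analyst must still examine the remaining branches.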

If it is necessary to extend the fault tree, integrate the new nodes into an expanded fault tree, FTE. As the analysis proceeds, an existing fault tree leaf may become an FTE predecessor to the newly created nodes, or a new expanded node lineage stemming from the root node may develop.

Figure 3-5 OFP Basic PNFT Link Table Without Gates

Figure 3-7 OFP Fault Tree

The evolved FTE thus becomes the basis for an FTE to expanded Petri net (PNE) conversion. Execute the conversion from the FTE to a PNE by linking the additional FTE nodes to their associated PNE places and transitions.

If the Petri net has not changed except to reflect the FTE, merely collate the links. Otherwise, establish the links to the new places or transitions in the Petri net and collate the links.

Simultaneously, create an expanded Petri net fault tree (PNFTE) Linkage reflecting the relationships between the FTE and the PNE. Determine the associated type of gate, 'and', 'or', or 'null', for each link. Execute this cyclic conversion as many times as necessary during analysis.

C. FAULT TREE TO PETRI NET CONVERSION AND INTEGRATION

Fault trees are well suited to initially describing a system that is ordered or deterministic. During the analysis, should concurrency or timing issues arise, the analyst may elect to proceed using Petri net techniques. This section details the steps involved in converting fault trees to Petri nets.

Figure  3-8  shows  the  'and'  and  'or'  relationships  of  transitions 
and  places  in  Petri  net  analysis. 
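Under the common convention (assumed here, since the drawing of Figure 3-8 is not reproduced), an 'and' is a single transition that needs a token in every input place, while an 'or' is one transition per input place, all feeding the same output place. The sketch builds both patterns and plays the token game; the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Net = Dict[str, Tuple[List[str], List[str]]]   # transition -> (inputs, outputs)


def and_gate(inputs: List[str], output: str) -> Net:
    # A single transition that consumes one token from every input place.
    return {"t_and": (list(inputs), [output])}


def or_gate(inputs: List[str], output: str) -> Net:
    # One transition per input place; any one of them can mark the output.
    return {f"t_or_{p}": ([p], [output]) for p in inputs}


def run(net: Net, marking: Dict[str, int]) -> Dict[str, int]:
    """Fire enabled transitions until none is enabled.

    Terminates for these gate patterns because every firing removes a token
    from an input place; it is not a general Petri net simulator.
    """
    m = dict(marking)
    fired = True
    while fired:
        fired = False
        for inputs, outputs in net.values():
            if all(m.get(p, 0) >= 1 for p in inputs):
                for p in inputs:
                    m[p] -= 1
                for p in outputs:
                    m[p] = m.get(p, 0) + 1
                fired = True
    return m


events = ["event1", "event2"]
print(run(and_gate(events, "out"), {"event1": 1}).get("out", 0))   # 0
print(run(or_gate(events, "out"), {"event1": 1}).get("out", 0))    # 1
```

With both events marked, the 'or' pattern puts two tokens in the output place. When the question is only whether the output condition can arise, that count does not matter, but it should be kept in mind when weights or token counts carry meaning.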
1. Fault Tree and Petri Net Starting Point Link

The root fault of the fault tree provides a starting point for the development of a partial Petri net and the link between the fault tree and the Petri net. Figure 3-9 depicts the fault tree root fault condition node, n1 (Practice Command Causes Actual Effect), as creating p1 of the Petri net. The hazard being analyzed is the firing of a weapon during simulated weapon use. While working top-down in the fault tree, develop subsequent transitions and places in the Petri net, as delineated in the next step.

Figure 3-8 Petri Net 'and' and 'or' Gates

2. Fault Tree to Petri Net Graphical and Tabular Linkage

Work downward in the fault tree to create the Petri net. For each node in the tree, create a place in the Petri net to represent the condition. Use transitions to represent the combinations indicated by the gates in the fault tree. Create linkage elements with null gates to represent the relationships. In Figure 3-9, the fault tree node n2 (AIU Executes Live Command) creates p2. This condition indicates the execution of the command to fire the weapon. Fault tree node n5 (Command Sent From ACU) creates p3, the propagation of a pulse. Two fault tree nodes, n9 (Command Enabled) and n10 (Arm Switch Signal On), are preconditions to n5. Fault tree node n9 (Command Enabled) creates p4, the command ready state. The other node required before a command is sent, fault tree node n10 (Arm Switch Signal On), creates p5. P5 indicates that a firing pulse is ready for propagation due to the prerequisite of the landing gear being up.
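The creation rules of this step can be sketched as follows: places are numbered top-down (breadth-first) from the root, an 'and' gate or a single-child 'null' gate becomes one transition whose inputs are the child places, and an 'or' gate becomes one transition per child. Each node is linked to its place with a 'null' gate. The tree structure and gates in the usage lines are assumptions: they keep only the nodes named above, assume a chain from n1 through n2 to n5, and treat n9 and n10 as an 'and' because both are required before a command is sent. The function and transition names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def tree_to_net(root: str, children: Dict[str, List[str]], gates: Dict[str, str]):
    """Build places, transitions, and null-gate links from a fault tree."""
    # 1. One place per node, numbered top-down from the root.
    places: Dict[str, str] = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        places[node] = f"p{len(places) + 1}"
        queue.extend(children.get(node, []))
    # 2. Transitions carry tokens from child (cause) places to the parent place.
    transitions: Dict[str, Tuple[List[str], List[str]]] = {}
    for node in places:
        kids = children.get(node, [])
        if not kids:
            continue                                   # leaf: no gate to translate
        gate = gates[node]
        if gate == "or":
            for kid in kids:                           # any one child suffices
                transitions[f"t_{kid}_{node}"] = ([places[kid]], [places[node]])
        elif gate == "and" or (gate == "null" and len(kids) == 1):
            transitions[f"t_{node}"] = ([places[k] for k in kids], [places[node]])
        else:
            raise ValueError(f"unsupported gate {gate!r} at {node}")
    # 3. Linkage elements with null gates, as (place, gate, node) triples.
    links = [(places[n], "null", n) for n in places]
    return places, transitions, links


children = {"n1": ["n2"], "n2": ["n5"], "n5": ["n9", "n10"]}
gates = {"n1": "null", "n2": "null", "n5": "and"}
places, transitions, links = tree_to_net("n1", children, gates)
print(places)   # {'n1': 'p1', 'n2': 'p2', 'n5': 'p3', 'n9': 'p4', 'n10': 'p5'}
print(transitions["t_n5"])   # (['p4', 'p5'], ['p3'])
```

With these assumptions the place numbering matches the text (n2 creates p2, n5 creates p3, n9 creates p4, n10 creates p5). As the next step explains, the resulting net is still incomplete and must be augmented before analysis.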

Once  the  analyst  creates  a  Petri  net  and  links  it  to  the  fault 
tree,  the  net  must  be  augmented  before  any  analysis  may  be 
performed.  The  next  step  guides  this  augmentation. 

3.  Petri  Net  Completion 

The  initial  Petri  net  generated  from  the  fault  tree  will  be 
incomplete.  Fault  trees  only  contain  events  and  conditions 
specifically  related  to  the  root  fault  and  omit  all  other  functional 
behavior.  This  omitted  behavior  must  be  added  to  the  Petri  net 
before  Petri  net  safety  analysis  can  proceed. 

Leveson and Stolzy [Ref. 9] describe how a Petri net may be derived from a software system. In this case, the derivation is focused on the gaps in the initial Petri net. As the Petri net is completed, the analyst may need to combine and rearrange nodes generated from the fault tree. As this is done, the linkage relation must be modified to reflect the change. Further inspection or analysis may reveal that the added portions of the Petri net relate to portions of the fault tree. In that case, the analyst must include these connections in the linkage relation, modifying 'null' gates to 'and' or 'or' gates as needed.

The  derived  and  expanded  Petri  net  may  then  be  used  for 
safety  analysis  as  described  by  Leveson  and  Stolzy  [Ref.  9].  The 
results  of  this  analysis  lead  to  the  generation  of  new  fault  tree  nodes 
as  described  in  Section  B. 
IV. SUMMARY AND CONCLUSIONS


A.  INTEGRATED  ANALYSIS  TECHNIQUE 

The  work  described  in  this  thesis  integrates  Software  Fault  Tree 
Analysis  (SFTA)  and  timed  Petri  nets  to  facilitate  software-safety 
analysis  in  heterogeneous-multiprocessor  control  systems.  The 
integrated  technique  uses  Petri  nets  for  ordering  events  and 
conditions  across  multiple  processors,  explicitly  representing 
concurrent  and  sequential  events.  This  allows  for  simpler  fault  tree 
analysis,  or  use  of  techniques  such  as  those  of  Leveson  and  Stolzy 
[Ref.  9],  who  describe  Petri  net  analysis  for  concurrency  errors  and 
time-related  errors. 

Recording the analysis logic, which explicitly represents the semantics of the events and conditions that lead to a hazard, requires SFTA. Petri nets explicitly show combinations of events and conditions, but the semantics behind those events and conditions are abstracted away. The integrated technique uses fault trees for analyzing non-sequential (or non-local) sequences of conditions that lead to a hazard, where several parts of the system must coordinate for the hazard to occur. This part of the analysis is derived from that of Leveson and Harvey.

A stepwise methodology is presented for converting a fault tree into a Petri net or converting a Petri net into a fault tree, then linking the two together for an integrated analysis. The order in which the techniques are used depends on the particular software being analyzed. During Petri net to fault tree conversion, the analyst establishes a fault tree to Petri net linkage while he is creating the fault tree from the Petri net. Conversely, he establishes a Petri net to fault tree linkage while he is creating the Petri net from the fault tree during fault tree to Petri net conversion.

The  resulting  integrated  linkage  relation  eases  the  iterative 
conversion  between  the  two  analysis  techniques.  This  relation  links 
information  in  one  representation  to  fields  of  the  other 
representation. 

B.  LESSONS  LEARNED 

This thesis gives the integrated analysis technique a formal basis. This formal basis is useful because it clarifies the information content of the various representations and provides a vocabulary for discussing conversion and linkage. An initial, informal conversion sketch is created, then formalized, and the technique is restructured around the formalization.

Multiple  views  of  analysis  information  are  stressed.  The 
multiplicity  of  views  gives  the  analyst  a  broader  view  of  the  analysis 
process,  allowing  him  to  both  detect  more  subtle  problems  in  the 
software  and  to  identify  problems  in  the  analysis  itself.  The 
multiplicity  occurs  across  two  dimensions:  presentation  of  technique 
(graphic  and  textual)  and  technique  usage  (Petri  nets  or  fault  trees). 
Graphic and textual views are essentially equivalent and complementary. The graphic view provides detail and connection focus.

The  integration  of  SFTA  and  Petri  net  analysis  also  gives  the 
analyst  two  complementary  views  of  the  system.  The  use  of  the 
system's  organizational  representations  in  the  Petri  net  and  the 
explicit  fault  representation  and  diagnosis  in  the  fault  tree  are 
melded  for  a  synergistic  effect.  The  Petri  net  provides  organization 
and  emulation,  whereas  the  fault  tree  provides  combination  and 
logic. 

C.  FUTURE  WORK 

The first recommendation stemming from the work done in this thesis is to automate the integrated software-safety analysis method presented. An analyst should not be replaced by an automated tool; there must be analyst interaction focusing the analyst's insight and experience on the hazards of a system. This method is proposed only as a tool for evaluation during the overall safety analysis. Tools exist for Petri net (P-Nut) and fault tree (SFTAT) analysis. A means to tie these tools together is suggested in this thesis.

Other modeling techniques that lend themselves well to integration need to be explored (such as PHA, Markov Chains, and FMEA). Individually, safety-analysis techniques have weaknesses that may be better addressed by other techniques. The integration of two or more analysis methods may well help to reduce life, property, and environmental losses due to hazard-inducing software by utilizing their combined advantages.

D.  CONCLUSIONS 

Software  faults  must  not  cause  hazards  after  a  system  is  in  the 
field.  Loss  of  life  or  property  can  result  from  hazardous  software 
used  in  real-life  environments.  Analysts  should  make  every  effort  to 
find  all  safety-critical  software  faults  before  the  developer  delivers 
the system to the users. Integrated analysis methods should
enhance  early  fault  discovery  by  focusing  on  the  key  safety-critical 
portions  of  the  software  and  avoiding  redundant  analysis. 

Further  research  in  all  areas  of  software-safety  is  required  to 
prevent  hazards  as  early  as  possible  in  the  development  and 
maintenance  life-cycle  of  people-,  property-,  or  environment-critical 
software. 